What's happening in 60 s on the web?

[Figure: infographic (Go-Globe) of web activity in 60 seconds. Legible figures: 168 million e-mails sent, 694,445 search queries, 695,000+ Facebook status updates, 79,364 wall posts, 510,040 comments, 98,000+ tweets. Other panels cover application downloads, music streaming on Pandora, Tumblr posts, new LinkedIn accounts, questions asked on Answers.com, photo uploads on Flickr, video uploads, Firefox downloads, WordPress plugin downloads and domain registrations.]
Bla Bla Bla
Conversations represent a big chunk of this traffic

© Scan & Target 2007-2010

Help, Natural Language Processing required!
U don't got da jack but remember we got da 
screenin 2mro at 8 



C vre ke C pa + facil ! G mi 2x + 2 tan a lir C 2 
post en langaj SMS ke 2 posts ekri normleman 




Hexo x ti y xa ti, tu pones las reglas 




Sda7med ya 5ouya Ma chba3tech biiik allah 
ghaleb...nchallah kol 3aam wenti 7ay b5iiir 
Who is Scan & Target?
Scan & Target analyzes digital communications in real
time to provide actionable intelligence to software
vendors, brands, service publishers, marketing agencies,
governments...

Social networks
Forums, blogs
E-mails
Instant Messaging

Our Text Meaning technology is smart enough to look in real
time at an incoming stream of user-generated text content,
see patterns of interest, and alert the right people or
trigger the appropriate action, all without being queried.







Customers 
[Customer logos, including KRDS (Facebook marketing agency)]
Scan & Target technology
Unlike solutions based on simple keywords or semantics, our technology
takes into account the different alterations and variants of
expressions to analyze the content:

■ Use of lower-case / capital letters

■ Letter repetition (vvviiiagrrra for example)

■ Orthographical variations (vi@gra, vlagra, vl@gra, vl49r4)

■ Missing letters in some cases (v | agra, v agra...)

■ Word alteration through inserted non-alphabetic symbols (v.i.a.g.r.a,
v_i°ag#r:a, v-iagra, viagr"a...)

■ Phonetic alterations

■ SMS and IM languages

■ And any combination of these variations
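
Several of the variations listed above (case, letter repetition, look-alike characters, inserted symbols) can be folded away by reducing each token to a canonical form before comparison. The following is only a minimal sketch of that general idea, not the vendor's actual engine: the `LOOKALIKES` table and the function names are assumptions chosen for illustration, it handles ASCII text only, and it does not cover missing letters or phonetic alterations, which would need fuzzy or phonetic matching.

```python
import re

# Look-alike substitutions. This table is an illustrative assumption,
# not a list taken from the source document.
LOOKALIKES = str.maketrans({
    "@": "a", "4": "a",
    "1": "i", "l": "i", "|": "i", "!": "i",
    "0": "o", "3": "e", "9": "g", "$": "s", "5": "s",
})


def canonical(token: str) -> str:
    """Reduce a token to a canonical skeleton for comparison."""
    s = token.lower().translate(LOOKALIKES)   # case + look-alike characters
    s = re.sub(r"[^a-z]", "", s)              # drop inserted symbols and spaces
    return re.sub(r"(.)\1+", r"\1", s)        # collapse repeated letters


def same_word(a: str, b: str) -> bool:
    """True when both tokens share the same canonical skeleton."""
    return canonical(a) == canonical(b)


if __name__ == "__main__":
    for variant in ["vvviiiagrrra", "vi@gra", "vl49r4", "v.i.a.g.r.a", "v agra"]:
        print(variant, same_word(variant, "viagra"))
```

Because the same reduction is applied to both sides of the comparison, mapping `l` to `i` or collapsing double letters does not matter for ordinary words. The last variant in the usage loop ("v agra") is not matched, which shows why missing letters need a separate fuzzy step.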



The solution is available in English, French, Spanish and
Arabic (MSA + dialects, Arabic alphabet + transliteration).
Scan & Target technology
The solution is based on a smart engine that rates not just single words
but the entire content as it passes through the filtering engine. Words
are therefore placed in context to extract meaning.

The solution applies detailed thematic thesauruses, our Smart
Wordbooks. Filters are categorized to allow customers to fine-tune the
analysis (Terrorism/Drugs/Violence, etc.) according to their needs.

Additional analysis layers: sentiment analysis, question detection...

Proprietary scoring technology tailored to short digital text content.

Thanks to a powerful and accurate conditional analysis system, our
customers experience a very low level of false positives (between 0.05%
and 0.001% on average).
Smuggling 









Big Data? No problem. 
• For homeland security, our API is delivered on
IBM hardware (to be hosted on your premises)

• Thanks to our connector, it is very easy to
integrate our API into your own applications

• You choose how to display our analysis results in
your interfaces

• Capacity to deal with Big Data in real time

- All of Twitter's traffic (10 TB / day, average 1200 Tweets per
second)* could be analyzed in real time using one IBM blade center
(for one language)

*Source: Twitter
Mass interception issues
• Mass interception of digital text
communications (OSINT, or COMINT such as SMS,
e-mails, IM...) is now technically feasible

• Issues for intelligence or law enforcement
agencies:

- How to deal with the volume (the flow never stops)

- How to find the needle in the digital haystack
"Finding the needle" strategies

Strategies compared: Identified Suspects / Interception on keywords / Indexation and search / Text Meaning

Benefits (+ / - marks as they survive in the source, left to right):
Real time information:     + - +
Fuzzy search:              - - +
Advanced analysis:         - - + +
False positive ratio:      - +
Unknown threat detection:  - + +
Required analyst time:     - - +

Strategies comparison on OSINT
Arabic usage

[Table: Top Ten Languages Used in the Web (Number of Internet Users by Language). Columns: Internet Users by Language; Internet Penetration by Language; Growth in Internet Users (since 2000); World Population for this Language (2010 Estimate). Languages listed include Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian and Korean.]

Arabic: 65,365,400 Internet users; 18.8 % penetration; 2,501.2 % growth; world population for this language 347,002,991 (2010 estimate)

Arabic principles 
• "Arabic" is used to describe three different forms of the same
language:

- Classical Arabic: used in the Qur'an and classical literature

- Modern Standard Arabic (MSA):
  ✓ No one's native spoken language any more
  ✓ The form of Arabic taught in schools and used in newspapers, books, sermons, TV...
  ✓ The most widely understood type of Arabic, used in conversation between
    educated Arabs from different countries

- Colloquial or Dialectal Arabic: national or regional varieties derived
from Classical Arabic, which constitute the everyday spoken language
Arabic dialects
• A number of Arabic dialects are spoken in the Arabian
peninsula, North Africa and the Middle East; most of them
differ considerably from one another

• Dialects are a mixture of the native or indigenous
languages and Arabic

• Many of these dialects are mutually incomprehensible
Iraq languages
[Pie chart of languages spoken in Iraq; legible share labels: 2 %, 1 %, 1 %]

■ Arabic, Mesopotamian
■ Arabic, North Mesopotamian
■ Kurdish, Northern
■ Arabic, Najdi
■ Azerbaijani, South
■ Kurdish, Central
■ Egyptian Spoken
■ Farsi, Western
■ Others
Dialects example

English sentence:                  I want         | to drink   | water
Standard Arabic transliteration:   Ureedu         | an ashraba | ma'an
Egyptian transliteration:          Awez           | ashrab     | mayya
Syrian transliteration:            Beddy          | eshrab     | Mayy
Saudi transliteration:             Abgha / Areed  | Ashrab     | Mayyeh
Moroccan transliteration:          Bghit          | Neshrab    | Elma
Transliteration
• Transliteration is the romanization of Arabic
- From قهوة to Gahwa (Coffee)

• Problem: written Arabic is normally
unvocalized, i.e., the vowels are not written
out and must be supplied by a reader familiar
with the language
Arabic chat alphabet 



http://scanandtarget.com/ - contact@scanandtarget.com 




• The Arabic chat alphabet (Arabish or Arabizi) is 
used to communicate in the Arabic language over 
the Internet or for sending messages via mobile 
phones when the Arabic alphabet is unavailable 

• Arabic letters are replaced by letters that are 
phonetically equivalent 

• Arabic letters that have no Latin phonetic 
counterpart are represented by numbers, or 
numbers in conjunction with an accent mark 
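
The digit convention described above can be illustrated with a small lookup table. This is a sketch only: the mapping below follows commonly cited Arabizi conventions, which vary by region and by writer, so the table and the function name `expand_arabizi_digits` are assumptions rather than a standard. It also treats the accent mark as an apostrophe placed after the digit, whereas some writers put it before.

```python
import re

# Commonly cited digit conventions (an illustrative assumption;
# real usage differs between countries and individual writers).
ARABIZI_DIGITS = {
    "2": "\u0621",   # hamza
    "3": "\u0639",   # ain
    "3'": "\u063a",  # ghain
    "5": "\u062e",   # kha
    "6": "\u0637",   # emphatic ta
    "6'": "\u0638",  # emphatic za
    "7": "\u062d",   # ha
    "7'": "\u062e",  # kha (alternative spelling)
    "9": "\u0635",   # sad
    "9'": "\u0636",  # dad
}

# A digit from the table, optionally followed by an apostrophe.
_DIGIT_TOKEN = re.compile(r"[235679]'?")


def expand_arabizi_digits(text: str) -> str:
    """Replace Arabizi digits with the Arabic letters they stand for.

    Latin letters are left untouched; only the digit tokens are mapped.
    """
    def replace(match):
        token = match.group(0)
        if token in ARABIZI_DIGITS:
            return ARABIZI_DIGITS[token]
        # Apostrophe after a digit with no accented entry: map the digit only.
        return ARABIZI_DIGITS[token[0]] + token[1:]

    return _DIGIT_TOKEN.sub(replace, text)


if __name__ == "__main__":
    print(expand_arabizi_digits("al 3ozma"))
```

A naive mapping like this also rewrites genuine numbers (times, quantities), so a real system would first have to decide whether a token is Arabizi at all.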
Issues with Arabic compared to Latin-script languages
• Language identification issue:

- MSA, dialects, mix of languages

• Transliteration issue (notably for names)

- ABD AL-WADOUD

- ABD ELOUADOUD

- ABD-AL-WADUD

- ABDEL EL-WADOUD

• Use of Arabish / Arabizi

- bri6ania al3o'6ma / britanya al 3ozma = Great Britain,
for example
Text meaning mission
• To identify and destroy terrorist / criminal
networks, you must detect the mistakes
they will make

• This is the job of text meaning: bringing
actionable intelligence to the analyst for
investigation
New threat detection
[Diagram linking: Contextual alerts, Target identification, Social network analysis, Thread analysis, Update alerts triggers]
Messages vs thread
• A web or mobile conversation is a thread of
messages between 2 or more persons

• Analysis is first performed at message level for
contextual alerts

• When an alert is detected, the associated
discussion thread is analyzed again to:

- Increase accuracy and precision

- Extract investigation elements (names, places,
nationality...)
Message identification: paedophilia 




From: Fergus 




PTHC = 

Pre Teen Hard Core 



Age detection 



Multimedia content 
extension detection 
Thread expansion: paedophilia 
111 fevrier 2010 17:1 OF Fergus: 


A 

— ' :but 1 can't open (Pthc) KG25-V2 4yo Heidi. rm 




111 fevrier 2010 17 : 1 0 |> Fergus: 


;imlmcjwri format 




111 fevrier 2010 17 : 10 | Micky: 


:you need real player 




[11 fevrier 2010 17:1 OF Fergus: 


:ah 




111 fevrier 2010 1 7 : 1 0 | Micky: 


:it is good 




111 fevrier 2010 1 7 : 1 0 | Micky: 


:kg 25 Regina shows her ass 




111 fevrier 2010 17 : 10 | Micky: 


;ali still critical error :( 




(11 fevrier 2010 17 : 1 0 [ Fergus: 


;yes :( 




111 fevrier 2010 17:1 OF Micky: / 

: / 


(11 fevrier 2010 17:1 OF Fergus: 


'don't worry, it will work tommorow... or Later... 


/ 


111 fevrier 2010 1 7 : 1 0 | Micky: 


m 

:other non pic forum wor ks 


/ 


(11 fevrier 2010 17:1 OF Fergus: 


/ 

• 
Investigation element: forum to be investigated
Use case: drug trafficking detection




• Mass surveillance of SMS communications (20 to 30 million per
day, in many different languages: English, Arabic, dialects...)

• Contextual alerts sent to analysts using conditional analysis:

- Substance-related discussions

- Transaction-related discussions (quantities, money...)

- Middlemen-related discussions (dealers, luggage handlers, dockers, customs...)

- Smuggling-related discussions (places such as ports and airports, smuggling tricks)

• Investigation by analyst (conversation thread analysis, social
network analysis...) identifies:

- Dealers' ring (pseudonyms, IP addresses...)

- Coded language (use of culinary vocabulary, for example)

• High precision: 40 alerts per million SMS
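
To put the stated alert rate in perspective, the two figures on this slide (20 to 30 million SMS per day, 40 alerts per million SMS) can be combined into a daily alert count. The short sketch below only does that arithmetic; the function name `daily_alerts` is hypothetical and no other numbers are assumed.

```python
ALERTS_PER_MILLION = 40  # rate quoted on the slide


def daily_alerts(sms_per_day: int, alerts_per_million: int = ALERTS_PER_MILLION) -> float:
    """Number of alerts raised per day at the quoted rate."""
    return sms_per_day * alerts_per_million / 1_000_000


if __name__ == "__main__":
    for volume in (20_000_000, 30_000_000):
        print(volume, daily_alerts(volume))
```

At these volumes the rate corresponds to roughly 800 to 1,200 alerts per day for analysts to review.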



Recommended solution
• Scan & Target text meaning technology is a very efficient
tool for detecting previously unknown terrorist or criminal
threats on the Internet or wireless networks

• Main benefits:

- Ability to deal with huge volumes in real time

- Multilingual, with the ability to handle fuzzy languages such as IM
or Arabizi

- Actionable intelligence with message & thread analysis

- Low level of false positives thanks to advanced analysis

• To be integrated into your existing monitoring system
Contact Information
Bastien Hillen, CEO
[Phone] + 33 6 11 25 53 80
b.hillen@scanandtarget.com

Scan & Target 

80 rue des haies 
75020 Paris 
France 

www.scanandtarget.com 

www.oorook.com 





